Efficient Text Classification Using Tree-structured Multi-linear Principle Component Analysis
نویسندگان
چکیده
A novel text data dimension reduction technique, called the tree-structured multi-linear principle component analysis (TMPCA), is proposed in this work. Being different from traditional text dimension reduction methods that deal with the word-level representation, the TMPCA technique reduces the dimension of input sequences and sentences to simplify the following text classification tasks. It is shown mathematically and experimentally that the TMPCA tool demands much lower complexity (and, hence, less computing power) than the ordinary principle component analysis (PCA). Furthermore, it is demonstrated by experimental results that the support vector machine (SVM) method applied to the TMPCA-processed data achieves commensurable or better performance than the state-of-the-art recurrent neural network (RNN) approach.
منابع مشابه
Multi-label Classification of Product Reviews Using Structured Svm
Most of the text classification problems are associated with multiple class labels and hence automatic text classification is one of the most challenging and prominent research area. Text classification is the problem of categorizing text documents into different classes. In the multi-label classification scenario, each document is associated may have more than one label. The real challenge in ...
متن کاملText Mining and Classification of Product Reviews Using Structured Support Vector Machine
Text mining and Text classification are the two prominent and challenging tasks in the field of Machine learning. Text mining refers to the process of deriving high quality and relevant information from text, while Text classification deals with the categorization of text documents into different classes. The real challenge in these areas is to address the problems like handling large text corp...
متن کاملComparison of Machine Learning Algorithms for Broad Leaf Species Classification Using UAV-RGB Images
Abstract: Knowing the tree species combination of forests provides valuable information for studying the forest’s economic value, fire risk assessment, biodiversity monitoring, and wildlife habitat improvement. Fieldwork is often time-consuming and labor-required, free satellite data are available in coarse resolution and the use of manned aircraft is relatively costly. Recently, unmanned aeria...
متن کاملExploring Gördes Zeolite Sites by Feature Oriented Principle Component Analysis of LANDSAT Images
Recent studies showed that remote sensing (RS) is an effective, efficient and reliable technique used in almost all the areas of earth sciences. Remote sensing as being a technique started with aerial photographs and then developed employing the multi-spectral satellite images. Nowadays, it benefits from hyper-spectral, RADAR and LIDAR data as well. This potential has widen its applicability in...
متن کاملA Mixtures-of-Experts Framework for Multi-Label Classification
We develop a novel probabilistic approach for multi-label classification that is based on the mixtures-of-experts architecture combined with recently introduced conditional tree-structured Bayesian networks. Our approach captures different input-output relations from multi-label data using the efficient tree-structured classifiers, while the mixtures-of-experts architecture aims to compensate f...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1801.06607 شماره
صفحات -
تاریخ انتشار 2018